Abstract

In many real-world situations, different and often opposite opinions, innovations, or products compete with one another for social influence in a networked society. In this paper, we study competitive influence propagation in social networks under the competitive linear threshold (CLT) model, an extension of the classic linear threshold model. Under the CLT model, we focus on the problem in which one entity tries to block the influence propagation of its competing entity as much as possible by strategically selecting a number of seed nodes that initiate its own influence propagation. We call this problem the influence blocking maximization (IBM) problem. We prove that the objective function of IBM in the CLT model is submodular, and thus a greedy algorithm achieves a 1 - 1/e approximation ratio. However, the greedy algorithm requires Monte-Carlo simulations of competitive influence propagation, which makes it inefficient. To address this issue, we design an efficient algorithm, CLDAG, which exploits the properties of the CLT model. We conduct extensive simulations of CLDAG, the greedy algorithm, and other baseline algorithms on real-world and synthetic datasets. Our results show that CLDAG provides the best accuracy, on par with the greedy algorithm and often better than the other algorithms, while running two orders of magnitude faster than the greedy algorithm.

Keywords: influence blocking maximization, competitive linear 
threshold model, social networks 

1 Introduction 

With the increasing popularity of online social and information networks such as Facebook, Twitter, and LinkedIn, many researchers have studied diffusion phenomena in social networks, including the diffusion of news, ideas, and innovations, and the adoption of new products. We generally refer to such diffusions as influence diffusion or propagation. One topic in influence diffusion that has been extensively studied is influence maximization [14, 15, 16, 19, 6, 5, 25, 17]. Influence maximization is the problem of selecting a small set of seed nodes in a social network such that their overall influence coverage is maximized under a given influence diffusion model. Popular influence diffusion models include the independent cascade (IC) model and the linear threshold (LT) model, which were first summarized by Kempe et al. in [14] based on prior research in social network analysis and particle physics. Both the IC and LT models are stochastic models characterizing how influence propagates through the network starting from the initial seed nodes.

However, all of the above works study the diffusion of only a single idea in a social network. In reality, different and often opposite information, ideas, and innovations frequently compete for influence in social networks. Such competing influence diffusion ranges from two companies running competing marketing campaigns to grab people's attention, or two political candidates from opposing parties trying to influence voters, to government authorities injecting truthful information to fight rumors spreading among the public.

Motivated by the above scenarios, several recent studies have looked into competitive influence diffusion and the corresponding influence maximization problems [1, 11, 21, 24]. Most of them propose extensions to existing influence diffusion models to incorporate competitive influence diffusion, and then either focus on the influence maximization problem for one of the competing parties or study the game-theoretic aspects of competitive influence diffusion (see Section 2 for more details on these related works). In this paper, we concentrate on the problem of how to block the influence diffusion of an opposing party as much as possible. For example, when a negative rumor about a company is spreading in the social network, the company may want to react quickly by selecting seed nodes that inject positive opinions about the company to fight the negative rumor. Similar situations occur when a political candidate tries to stop a negative rumor about him or her, or when government or public officials try to stop erroneous rumors about public health and safety, terrorist threats, etc. We call the problem of selecting positive seed nodes in a social network to minimize the effect of negative influence diffusion, or equivalently to maximize the blocking effect on negative influence, the influence blocking maximization (IBM) problem.

We study the IBM problem under a competitive linear threshold (CLT) model, which is a natural extension of the classic linear threshold model and is similar to a model proposed independently in [2]. We prove that the objective function of IBM under the CLT model is monotone and submodular, which means that a standard greedy algorithm can achieve an approximation ratio of $1 - 1/e - \epsilon$ to the optimal solution, where $\epsilon$ is any positive number. However, the greedy algorithm requires Monte-Carlo simulations of competitive influence diffusion, which become very slow for large networks if we want to keep $\epsilon$ small. For example, in our simulations, the greedy algorithm takes more than 8 hours to finish on a network with 6.4k nodes. This is especially problematic for the IBM problem, since blocking influence diffusion usually requires very swift decisions before the negative influence propagates too far. To address the efficiency issue, we exploit the efficient computation property of the LT model on directed acyclic graphs (DAGs) and design an efficient heuristic, CLDAG, for the IBM problem under the CLT model. Because of the complex interaction in competitive influence diffusion under the CLT model, CLDAG needs a carefully designed dynamic programming method for influence computation. To test the efficiency and effectiveness of CLDAG, we conduct extensive simulations on three real-world networks as well as synthetic networks, comparing CLDAG with the greedy algorithm and other heuristic algorithms. Our results show that (a) compared with the greedy algorithm, CLDAG achieves a matching influence blocking effect while running two orders of magnitude faster; and (b) compared with other heuristics such as degree-based heuristics, CLDAG consistently performs well and often outperforms them by a significant margin.

To the best of our knowledge, our work is the first to study the IBM problem under the competitive linear threshold model. The study closest to our work is that of Budak et al., but they study the IBM problem under an extension of the independent cascade model, and because of non-submodularity, their results only apply to a restricted and less natural extension of the IC model. Moreover, their work does not address the efficiency issue, which is vital to influence blocking maximization.

The rest of the paper is organized as follows. We discuss related work in Section 2. In Section 3, we specify the competitive linear threshold model. In Section 4, we define the influence blocking maximization problem, show that it is NP-hard, and prove its submodularity under the CLT model. We describe our CLDAG algorithm in Section 5 and then provide our experimental evaluation results in Section 6. We conclude the paper with discussions in Section 7.

2 Related Work 

The independent cascade model and the linear threshold model are two extensively studied influence diffusion models originally summarized by Kempe et al. [14], based on the earlier works [11, 23, 10]. Kempe et al. prove that the generalized versions of these two models are equivalent [14]. Based on the IC and LT models, Kempe et al. propose a greedy algorithm for the influence maximization problem (introduced by Richardson [22]), that is, for maximizing the spread of a single idea, innovation, etc. under these two models. Many follow-up studies propose alternative heuristics to solve the influence maximization problem more efficiently [16, 19, 7]. In terms of efficient algorithm design, our work follows the idea of finding efficient local graph structures to speed up the computation. In particular, our CLDAG algorithm is similar to the LDAG algorithm of [7], which is also based on the DAG structure, but CLDAG is novel in handling competitive influence diffusion with a dynamic programming method.

Recently, there have been a number of studies on competitive influence diffusion. Bharathi et al. extend the IC model to model competitive influence [1], but they only provide a polynomial-time approximation algorithm for trees. Kostka et al. study competitive rumor spreading [11] on a more restricted model than IC and LT, and focus on showing the hardness of computing the optimal solution for the two competing parties. Pathak et al. study a model of multiple cascades [21], which is an extension of a different diffusion model called the voter model [13], even though they claim it to be a generalization of the linear threshold model. They only study model dynamics and do not address the influence maximization problem. Trpevski et al. [24] propose another competitive rumor spreading model based on the SIS epidemic model and study its dynamics on several classes of graphs; they do not address influence maximization either. Borodin et al. [2] extend the LT model in several different ways to model competitive influence diffusion, one of which is essentially our CLT model except for a different tie-breaking rule. However, they only study the influence maximization problem, not influence blocking maximization. In particular, they show that influence maximization in the CLT model is not submodular, which is an interesting contrast to our result that influence blocking maximization under the CLT model is submodular. We offer some explanation in Section 7 for this subtle difference. The work of Budak et al. is the only one we found that studies influence blocking maximization (they call it eventual influence limitation), but they study this problem under an extension of the IC model. They show that the general extension of the IC model, in which positive and negative influence have separate sets of parameters (as in our CLT model), is not submodular; to achieve submodularity, they have to restrict the model so that the positive propagation probability is 1 or equal to the negative propagation probability, which limits the expressiveness of the model. Moreover, they only study the greedy algorithm and some simple heuristics, and do not provide an efficient and scalable solution that maintains good accuracy at the same time. Finally, the work of [4] studies negative opinions that emerge from poor product or service quality rather than being generated by competitors. They study positive influence maximization under an extension of the IC model, which differs from our study of blocking negative influence under an extension of the LT model. The efficient influence maximization algorithm in [4] also uses dynamic programming, which bears some resemblance to our work.

3 Competitive Linear Threshold Model 

Kempe et al. described the linear threshold model in [14] as a stochastic model of information cascades in a network. In this model, a social network is represented as a directed graph G = (V, E), where V is the set of vertices representing individuals and E is the set of directed edges representing influence relationships among individuals. Each edge $(u, v) \in E$ has a weight $w_{uv} > 0$, indicating the importance of u in influencing v. For convenience, $w_{uv} = 0$ for any $(u, v) \notin E$. For each $v \in V$, we have $\sum_{u \in V} w_{uv} \le 1$. Each vertex v picks an independent threshold $\theta_v$ uniformly at random from [0, 1]. Each vertex is either inactive or active, and once it becomes active, it stays active forever. The diffusion process unfolds in discrete time steps. At step 0, a seed set $S \subseteq V$ is activated while all other vertices are inactive. At any later step t > 0, a vertex v is activated if and only if the total weight of its active in-neighbors reaches its threshold $\theta_v$, that is, $\sum_{u \in S_{t-1}} w_{uv} \ge \theta_v$, where $S_{t-1} \subseteq V$ is the set of active vertices by time t - 1, with $S_0 = S$.
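To make the LT dynamics concrete, here is a minimal Python sketch of a single LT run. It assumes, purely for illustration, that the graph is stored as a dict mapping every node v to a dict of its in-neighbors u and weights $w_{uv}$; the function name `simulate_lt` is hypothetical.

```python
import random


def simulate_lt(in_weights, seeds, rng=random):
    """Run the linear threshold (LT) model once and return the final active set.

    in_weights: dict mapping every node v to {u: w_uv} for its in-neighbours u,
                with sum(w_uv) <= 1 for each v (hypothetical input format).
    seeds:      the seed set S, active at step 0.
    """
    # Each vertex draws an independent threshold theta_v uniformly from [0, 1).
    theta = {v: rng.random() for v in in_weights}
    active = set(seeds)  # S_0 = S
    while True:
        # A vertex activates at step t if the weight of S_{t-1} reaches theta_v.
        newly = {
            v for v in in_weights
            if v not in active
            and sum(w for u, w in in_weights[v].items() if u in active) >= theta[v]
        }
        if not newly:  # nothing changed: the process has stopped
            return active
        active |= newly  # S_t = S_{t-1} plus the new activations
```

For example, on a chain a -> b -> c with unit weights, seeding a activates all three nodes in every run, because a total weight of 1 always reaches a threshold drawn from [0, 1).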

We now extend the LT model to incorporate competitive influence diffusion. The idea is to allow each vertex to be either positively or negatively activated, as determined by concurrent positive and negative diffusions, respectively. If a vertex is both positively and negatively activated in the same step, negative activation dominates.

More precisely, we define the competitive linear threshold (CLT) model as an extension of the LT model in the following way. Each vertex has three states, inactive, +active, and -active, and it does not change state once it becomes +active or -active. Each edge (u, v) has two weights, a positive weight $w^+_{uv}$ and a negative weight $w^-_{uv}$. We can also think of (u, v) as split into two virtual edges, a positive edge propagating positive influence and a negative edge propagating negative influence. Each vertex v picks two independent thresholds uniformly at random from [0, 1], a positive threshold $\theta^+_v$ and a negative threshold $\theta^-_v$. At step 0, there are two disjoint seed sets, the positive seed set $P_0$ and the negative seed set $N_0$. At each step t, positive influence and negative influence propagate independently as in the original LT model, using positive weights/thresholds and negative weights/thresholds, respectively. If a vertex v is activated only by positive diffusion (resp. negative diffusion), then v becomes +active (resp. -active). If in step t, v is activated by both positive diffusion and negative diffusion, then negative diffusion dominates and v becomes -active. This negative dominance rule reflects the negativity bias phenomenon well studied in social psychology, and matches the common sense that rumors are usually hard to fight.
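The following Python sketch simulates one run of the CLT model as just defined, including the negative dominance rule. The representation (one dict of in-neighbor weights for positive edges and one for negative edges, each containing every node as a key) and the name `simulate_clt` are illustrative assumptions, not part of the model.

```python
import random


def simulate_clt(pos_in, neg_in, pos_seeds, neg_seeds, rng=random):
    """One run of the competitive linear threshold (CLT) model.

    pos_in / neg_in: dict node -> {in_neighbour: weight} for positive / negative
    edges; both must contain every node as a key. Seed sets must be disjoint.
    Returns (plus_active, minus_active).
    """
    nodes = list(pos_in)
    # Two independent thresholds per node, both uniform on [0, 1).
    theta_pos = {v: rng.random() for v in nodes}
    theta_neg = {v: rng.random() for v in nodes}
    plus, minus = set(pos_seeds), set(neg_seeds)
    while True:
        new_plus, new_minus = set(), set()
        for v in nodes:
            if v in plus or v in minus:
                continue  # activated nodes never change state
            # Positive influence comes only from +active in-neighbours,
            # negative influence only from -active in-neighbours.
            hit_pos = sum(w for u, w in pos_in[v].items() if u in plus) >= theta_pos[v]
            hit_neg = sum(w for u, w in neg_in[v].items() if u in minus) >= theta_neg[v]
            if hit_neg:
                new_minus.add(v)  # negative dominance when both fire in the same step
            elif hit_pos:
                new_plus.add(v)
        if not new_plus and not new_minus:
            return plus, minus
        plus |= new_plus
        minus |= new_minus
```

With unit weights p -> v (positive) and n -> v (negative), both thresholds of v are crossed in step 1 and v ends up -active, as the negative dominance rule requires.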

The CLT model defined here is essentially the same as the separate threshold model of [2], except that we use negative dominance as the tie-breaking rule, while they use the random rule, in which the +active or -active status is picked uniformly at random. The difference in the tie-breaking rule is not essential for our study: the submodularity property still holds, and our algorithm can be properly adapted to the random tie-breaking rule.

4 Influence Blocking Maximization Problem 

In this section, we first define the influence blocking maxi- 
mization (IBM) problem, then show that IBM under the CLT 
model is NP-hard, and finally prove that the objective func- 
tion of IBM is monotone and submodular, which leads to a 
greedy approximation algorithm. 

4.1 Problem definition. Informally, the IBM problem is an optimization problem in which, given a graph G = (V, E), its positive and negative edge weights, a negative seed set $N_0$, and a positive integer k, we want to find a positive seed set S of size at most k such that the expected number of negatively activated nodes is minimized, or equivalently, the reduction in the number of negatively activated nodes is maximized.

More precisely, let $\theta^+$ and $\theta^-$ be the vectors of positive thresholds and negative thresholds, respectively, for all vertices in G. According to the CLT model, each is drawn uniformly at random from the probability space $[0, 1]^{|V|}$. When $\theta^+$ and $\theta^-$ are fixed, all randomness in the CLT model is fixed. Let $IBS(S, N_0 \mid \theta^+, \theta^-)$ be the set of nodes v in G such that, under thresholds $\theta^+$ and $\theta^-$, v is negatively activated if $N_0$ is the negative seed set and the positive seed set is empty, while v is not negatively activated if $N_0$ is the negative seed set and S is the positive seed set. This set thus represents the nodes that have been blocked from negative influence, and IBS stands for influence blocking set. Since we always use $N_0$ as the negative seed set, we omit $N_0$ from the notation for simplicity. When the context is clear, we may also omit $\theta^+$ and $\theta^-$ and simply write $IBS(S)$ for the influence blocking set. We define the negative influence reduction (NIR) of a positive seed set S, denoted $\sigma_{NIR}(S)$, to be the expected size of $IBS(S \mid \theta^+, \theta^-)$, with the expectation taken over all $\theta^+$ and $\theta^-$, that is,

$$\sigma_{NIR}(S) = \mathbb{E}_{\theta^+, \theta^-}\left[\, |IBS(S \mid \theta^+, \theta^-)| \,\right].$$

Then influence blocking maximization is the problem of finding a positive seed set S of size at most k that maximizes $\sigma_{NIR}(S)$, i.e., computing

$$P^* = \arg\max_{|S| \le k} \sigma_{NIR}(S).$$
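A minimal sketch of a Monte-Carlo estimator for $\sigma_{NIR}(S)$ follows. It assumes a hypothetical input format in which `pos_in` and `neg_in` map every node to a dict of its positive and negative in-neighbor weights. For each sampled pair of threshold vectors, it runs the CLT process twice, once without and once with S, and counts the nodes in $IBS(S \mid \theta^+, \theta^-)$, so the thresholds are held fixed exactly as in the definition. `run_clt` and `estimate_nir` are illustrative names.

```python
import random


def run_clt(pos_in, neg_in, pos_seeds, neg_seeds, th_pos, th_neg):
    """Deterministic CLT run under fixed thresholds; returns the final -active set."""
    plus, minus = set(pos_seeds), set(neg_seeds)
    while True:
        new_plus, new_minus = set(), set()
        for v in pos_in:
            if v in plus or v in minus:
                continue
            if sum(w for u, w in neg_in[v].items() if u in minus) >= th_neg[v]:
                new_minus.add(v)  # negative dominance also covers ties
            elif sum(w for u, w in pos_in[v].items() if u in plus) >= th_pos[v]:
                new_plus.add(v)
        if not (new_plus or new_minus):
            return minus
        plus |= new_plus
        minus |= new_minus


def estimate_nir(pos_in, neg_in, S, N0, runs=1000, seed=0):
    """Monte-Carlo estimate of sigma_NIR(S) = E[|IBS(S | theta+, theta-)|]."""
    rng = random.Random(seed)
    total = 0
    for _ in range(runs):
        th_pos = {v: rng.random() for v in pos_in}
        th_neg = {v: rng.random() for v in pos_in}
        # Same thresholds for both runs, as in the definition of IBS.
        without_s = run_clt(pos_in, neg_in, set(), N0, th_pos, th_neg)
        with_s = run_clt(pos_in, neg_in, S, N0, th_pos, th_neg)
        total += len(without_s - with_s)  # nodes blocked from negative influence
    return total / runs
```

On the negative chain n -> x -> a with unit weights, seeding x positively blocks both x and a in every sample, so the estimate for S = {x} is exactly 2.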

We first show that the exact IBM problem is NP-hard.

THEOREM 4.1. Under the CLT model, the IBM problem is NP-hard.

Proof. By a reduction from the vertex cover problem. The full NP-hardness proof is presented in the Appendix. □

4.2 Submodularity of $\sigma_{NIR}(S)$ and the greedy approximation algorithm. To overcome the NP-hardness result of Theorem 4.1, we look for approximation algorithms. The submodularity of the set function $\sigma_{NIR}(S)$ provides a good way to obtain an approximation algorithm for the IBM problem. We say that a set function f(S) with domain $2^V$ is submodular if for all $S \subseteq T \subseteq V$ and $x \notin T$, we have $f(S \cup \{x\}) - f(S) \ge f(T \cup \{x\}) - f(T)$. Intuitively, submodularity of f means that f has the diminishing marginal return property. Moreover, we say that f is monotone if $f(S) \le f(T)$ for all $S \subseteq T \subseteq V$.
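For small ground sets, both properties can be checked by brute force. The helper below (a hypothetical `is_monotone_submodular`, exponential in the size of the ground set and meant only to illustrate the two definitions) enumerates all pairs $S \subseteq T$ and all $x \notin T$.

```python
from itertools import combinations


def subsets(universe):
    """All subsets of `universe` as frozensets."""
    items = list(universe)
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)


def is_monotone_submodular(f, universe, tol=1e-12):
    """Brute-force check of monotonicity and submodularity (exponential time)."""
    all_sets = list(subsets(universe))
    for S in all_sets:
        for T in all_sets:
            if not S <= T:
                continue
            if f(S) > f(T) + tol:  # monotone: f(S) <= f(T)
                return False
            for x in universe:
                if x in T:
                    continue
                # Diminishing returns: f(S + x) - f(S) >= f(T + x) - f(T).
                if f(S | {x}) - f(S) < f(T | {x}) - f(T) - tol:
                    return False
    return True
```

A coverage function passes the check, while $f(S) = |S|^2$, whose marginal returns increase with |S|, fails it.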

We now show that $\sigma_{NIR}(S)$ is monotone and submodular. We follow the general methodology of [14] for the proof, but our proof is more involved because of the complexity of the CLT model and the IBM problem. We first construct an equivalent random process and then use it to prove the result.

From the original graph G = (V, E) with positive and negative weights, we construct a random live-path graph $G_X$ as follows. For each $v \in V$, we randomly pick one positive in-edge (u, v) with probability $w^+_{uv}$, and with probability $1 - \sum_{u \in V} w^+_{uv}$ no positive in-edge is selected; we also randomly pick one negative in-edge (u, v) with probability $w^-_{uv}$, and with probability $1 - \sum_{u \in V} w^-_{uv}$ no negative in-edge is selected. Let $G^+$ be the subgraph of $G_X$ consisting of only positive edges, and let $G^-$ be the subgraph of $G_X$ consisting of only negative edges. Given a positive seed set $P_0$ and a negative seed set $N_0$, define $d_{G^+}(P_0, v)$ to be the shortest graph distance from any node in $P_0$ to v through positive edges only, and $d_{G^-}(N_0, v)$ to be the shortest graph distance from any node in $N_0$ to v through negative edges only. These distances are $\infty$ if no such path exists. In the random live-path graph, we say that a node v is +active if $d_{G^+}(P_0, v) < \infty$ and $d_{G^+}(P_0, v) < d_{G^-}(N_0, v)$, and v is -active if $d_{G^-}(N_0, v) < \infty$ and $d_{G^-}(N_0, v) \le d_{G^+}(P_0, v)$, where the non-strict inequality reflects the negative dominance rule. The following lemma shows that the positive and negative activation sets generated by this random process are equivalent to the corresponding sets generated by the CLT model.
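One draw of the live-path graph and the two distance functions can be sketched in Python as follows. Each node's live in-edge is stored as a single parent pointer (None when no in-edge is selected), which works because at most one positive and one negative in-edge survive per node. The dict-of-in-neighbor-weights input format and all function names are illustrative assumptions.

```python
import random
from collections import deque


def sample_live_parent(in_weights, rng):
    """Pick at most one live in-edge per node: (u, v) with prob w_uv, none otherwise."""
    parent = {}
    for v, nbrs in in_weights.items():
        r, acc = rng.random(), 0.0
        parent[v] = None  # no live in-edge, probability 1 - sum_u w_uv
        for u, w in nbrs.items():
            acc += w
            if r < acc:
                parent[v] = u
                break
    return parent


def live_distances(parent, sources):
    """Hop distance from `sources` along live edges; absent keys mean infinity."""
    children = {}
    for v, u in parent.items():
        if u is not None:
            children.setdefault(u, []).append(v)
    dist = {s: 0 for s in sources}
    queue = deque(sources)
    while queue:
        u = queue.popleft()
        for v in children.get(u, []):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def sample_live_path_graph(pos_in, neg_in, P0, N0, rng=random):
    """One draw of G_X, returning d_{G+}(P0, .) and d_{G-}(N0, .)."""
    g_pos = sample_live_parent(pos_in, rng)  # G+: positive live edges
    g_neg = sample_live_parent(neg_in, rng)  # G-: negative live edges, drawn independently
    return live_distances(g_pos, P0), live_distances(g_neg, N0)
```

Over many draws, a positive edge with weight 0.3 is live in about 30% of the sampled graphs, which is the per-edge behavior the construction above prescribes.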

LEMMA 4.1. For a given positive seed set $P_0$ and negative seed set $N_0$, the distribution over +active sets and -active sets is identical under the following two definitions:

1. the distribution obtained by running the CLT process;

2. the distribution obtained from reachability, as defined above, in the live-path graph.

Proof. The activation process under the CLT model consists of several iterations. In each iteration, some nodes change from inactive to +active or -active. We define $A^+_t$ to be the set of +active nodes and $A^-_t$ the set of -active nodes at the end of iteration t, for t = 0, 1, 2, ..., with the convention $A^+_{-1} = A^-_{-1} = \emptyset$. Consider a node v that has not been activated by the end of iteration t, namely $v \notin A^+_t \cup A^-_t$. The probability that v becomes +active in iteration t + 1 equals the chance that the positive influence weights in $A^+_t \setminus A^+_{t-1}$ push it over the positive threshold while the negative influence weights are still below the negative threshold. Conditioned on neither v's positive nor its negative threshold having been exceeded by step t, this probability is

$$\frac{\sum_{u \in A^+_t \setminus A^+_{t-1}} w^+_{uv}}{1 - \sum_{u \in A^+_{t-1}} w^+_{uv}} \cdot \frac{1 - \sum_{u \in A^-_t} w^-_{uv}}{1 - \sum_{u \in A^-_{t-1}} w^-_{uv}}.$$

Similarly, the probability that node v becomes -active in iteration t + 1, given that v is inactive from iteration 0 to t, is

$$\frac{\sum_{u \in A^-_t \setminus A^-_{t-1}} w^-_{uv}}{1 - \sum_{u \in A^-_{t-1}} w^-_{uv}},$$

since, by negative dominance, v becomes -active whenever its negative threshold is reached, regardless of its positive threshold.

On the other hand, consider the same probabilities in the random live-path graph. We start from the seed sets $P_0$ and $N_0$ and call them $B^+_0$ and $B^-_0$, respectively, with $B^+_{-1} = B^-_{-1} = \emptyset$. For each t = 1, 2, ..., we define $B^-_t$ to be $B^-_{t-1}$ together with every $v \notin B^+_{t-1} \cup B^-_{t-1}$ whose negative in-edge comes from some node in $B^-_{t-1}$, and we define $B^+_t$ to be $B^+_{t-1}$ together with every $v \notin B^+_{t-1} \cup B^-_{t-1}$ whose positive in-edge comes from some node in $B^+_{t-1}$ but whose negative in-edge does not come from any node in $B^-_{t-1}$.

By the definition of the random live-path graph, the probability that a node v is in $B^+_{t+1} \setminus B^+_t$, conditioned on $v \notin B^+_t \cup B^-_t$, is

$$\frac{\sum_{u \in B^+_t \setminus B^+_{t-1}} w^+_{uv}}{1 - \sum_{u \in B^+_{t-1}} w^+_{uv}} \cdot \frac{1 - \sum_{u \in B^-_t} w^-_{uv}}{1 - \sum_{u \in B^-_{t-1}} w^-_{uv}}.$$

Similarly, the probability that v is in $B^-_{t+1} \setminus B^-_t$, conditioned on $v \notin B^+_t \cup B^-_t$, is

$$\frac{\sum_{u \in B^-_t \setminus B^-_{t-1}} w^-_{uv}}{1 - \sum_{u \in B^-_{t-1}} w^-_{uv}}.$$

These conditional probabilities are the same as those derived from the CLT model. Since $A^+_0 = B^+_0$ and $A^-_0 = B^-_0$, by induction over the iterations we conclude that the random live-path graph model produces the same distribution over +active and -active sets as the CLT model. □
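The key step of the proof can be checked numerically. Writing $a_{old} = \sum_{u \in A^+_{t-1}} w^+_{uv}$, $a_{new} = \sum_{u \in A^+_t \setminus A^+_{t-1}} w^+_{uv}$, $b_{old} = \sum_{u \in A^-_{t-1}} w^-_{uv}$ and $b_t = \sum_{u \in A^-_t} w^-_{uv}$ (shorthand introduced only for this sketch), the code below compares the two closed-form conditional probabilities from the CLT side of the proof with a direct simulation of uniform thresholds, conditioned on v still being inactive. The function names are hypothetical.

```python
import random


def lemma41_closed_form(a_old, a_new, b_old, b_t):
    """Conditional probabilities that v becomes +active / -active in iteration t+1."""
    p_plus = a_new / (1 - a_old) * (1 - b_t) / (1 - b_old)
    p_minus = (b_t - b_old) / (1 - b_old)
    return p_plus, p_minus


def lemma41_monte_carlo(a_old, a_new, b_old, b_t, trials=200_000, seed=1):
    """Estimate the same probabilities by sampling uniform thresholds directly."""
    rng = random.Random(seed)
    kept = plus = minus = 0
    while kept < trials:
        th_pos, th_neg = rng.random(), rng.random()
        if th_pos <= a_old or th_neg <= b_old:
            continue  # v would already be active: condition this case away
        kept += 1
        if th_neg <= b_t:
            minus += 1  # negative threshold reached (negative dominance covers ties)
        elif th_pos <= a_old + a_new:
            plus += 1  # only the positive threshold reached
    return plus / kept, minus / kept
```

For $a_{old} = 0.2$, $a_{new} = 0.3$, $b_{old} = 0.1$ and $b_t = 0.4$, the closed forms give 0.25 and 1/3, and the simulated frequencies agree to within sampling error.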



With the equivalence shown in Lemma 4.1, we now focus on showing the monotonicity and submodularity of negative influence reduction in the random live-path graph model. With a slight abuse of notation, given a live-path graph $G_X$ and a negative seed set $N_0$, we also use $IBS(S)$ to denote the set of nodes in V that would be -active if the positive seed set were empty but are not -active if the positive seed set is S. Then the negative influence reduction is $\sigma_{NIR}(S) = \mathbb{E}_{G_X}[\, |IBS(S)| \,]$.

Given a set S and a node $u \notin S$, we say that there is
a unique path from S to u if there exists some path from a 
node in S to u, and for any two paths from any two nodes 
in S to u, one path must be a sub-path of the other. In 
addition, whenever we refer to the unique path from S to u, 
we mean the unique shortest path from any node in S to u. 
The following lemma shows a simple yet important property 
of the live-path graph that leads to the submodularity proof. 

LEMMA 4.2. In a live-path graph $G_X$, for any node v, there is a unique positive path from some node in the positive seed set S to v if $d_{G^+}(S, v) < \infty$, and there is a unique negative path from some node in the negative seed set $N_0$ to v if $d_{G^-}(N_0, v) < \infty$.

Proof. This is obvious because each node has at most one positive in-edge and one negative in-edge, so following in-edges backward from v traces out a single path. □

We next use two lemmas to give the necessary and sufficient conditions for $v \in IBS(S)$ and $u \in IBS(T \cup \{v\}) \setminus IBS(T)$ in a live-path graph $G_X$.

LEMMA 4.3. The necessary and sufficient condition for $v \in IBS(S)$ is:

1. there exists a unique negative path in $G^-$ from node set $N_0$ to v, namely $d_{G^-}(N_0, v) < \infty$, and

2. there exists at least one node u on the unique negative path such that $d_{G^+}(S, u) < d_{G^-}(N_0, u)$.

Proof. Lemma 4.3 follows directly from the definition of $IBS(S)$ and Lemma 4.2. □
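Lemmas 4.2 and 4.3 translate directly into code for a fixed live-path graph. In this sketch, the graph is given by parent pointers (`pos_parent[v]` and `neg_parent[v]` are v's live positive and negative in-neighbors, or None), so the unique negative path is obtained by following parents back to $N_0$. The representation and function names are hypothetical, and the code transcribes the two conditions of Lemma 4.3 rather than simulating the diffusion.

```python
from collections import deque


def hop_distances(parent, sources):
    """Shortest hop distance from `sources` along live edges; absent keys mean infinity."""
    children = {}
    for v, u in parent.items():
        if u is not None:
            children.setdefault(u, []).append(v)
    dist = {s: 0 for s in sources}
    queue = deque(sources)
    while queue:
        u = queue.popleft()
        for v in children.get(u, []):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def negative_path(neg_parent, N0, v):
    """Unique negative path from N0 to v (Lemma 4.2), found by following live parents."""
    path, x = [v], v
    while x not in N0:
        x = neg_parent.get(x)
        if x is None or x in path:
            return None  # no path back to N0
        path.append(x)
    return path[::-1]


def ibs(pos_parent, neg_parent, S, N0):
    """IBS(S) in a fixed live-path graph via the two conditions of Lemma 4.3."""
    d_pos = hop_distances(pos_parent, S)
    d_neg = hop_distances(neg_parent, N0)
    inf = float("inf")
    blocked = set()
    for v in d_neg:  # condition 1: d_{G-}(N0, v) < infinity
        path = negative_path(neg_parent, N0, v)
        # Condition 2: some node u on the path is reached positively strictly earlier.
        if path and any(d_pos.get(u, inf) < d_neg[u] for u in path):
            blocked.add(v)
    return blocked
```

On the negative chain n -> x -> a with a positive live edge p -> x, choosing S = {p} blocks nothing, because positive and negative influence reach x at the same time and negative dominance wins; choosing S = {x} blocks both x and its negative descendant a.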



LEMMA 4.4. The necessary and sufficient condition for $u \in IBS(T \cup \{v\}) \setminus IBS(T)$ is:

1. there exists a unique negative path from $N_0$ to u,

2. there exists at least one node w on the unique negative path from $N_0$ to u such that $d_{G^+}(T \cup \{v\}, w) < d_{G^-}(N_0, w)$, and

3. for every node t on the unique negative path from $N_0$ to u, $d_{G^+}(T, t) \ge d_{G^-}(N_0, t)$.

Proof. Conditions 1 and 2 follow directly from Lemma 4.3 applied to $u \in IBS(T \cup \{v\})$. Condition 3 follows directly from Lemma 4.3 applied to $u \notin IBS(T)$. □
LEMMA 4.5. The cardinality set function $|IBS(S)|$ for a live-path graph $G_X$ is monotone and submodular.

Proof. We first prove the monotonicity of $|IBS(S)|$, namely that for any subset $S \subseteq V$ and any node $u \in V \setminus (S \cup N_0)$, $|IBS(S)| \le |IBS(S \cup \{u\})|$. We prove this by showing that $IBS(S) \subseteq IBS(S \cup \{u\})$. Consider any node $v \in IBS(S)$. By Lemma 4.3, we have $d_{G^-}(N_0, v) < \infty$, and there exists a node w on the unique negative path from $N_0$ to v such that $d_{G^+}(S, w) < d_{G^-}(N_0, w)$. It is also clear that $d_{G^+}(S \cup \{u\}, w) \le d_{G^+}(S, w)$. Thus $d_{G^+}(S \cup \{u\}, w) < d_{G^-}(N_0, w)$, and by Lemma 4.3, $v \in IBS(S \cup \{u\})$.

We then prove the submodularity of $|IBS(S)|$ by showing that for any subsets $S \subseteq T \subseteq V$ and any $v \in V \setminus (T \cup N_0)$,

$$IBS(T \cup \{v\}) \setminus IBS(T) \subseteq IBS(S \cup \{v\}) \setminus IBS(S).$$

Given any $u \in IBS(T \cup \{v\}) \setminus IBS(T)$, we prove that $u \in IBS(S \cup \{v\}) \setminus IBS(S)$ by showing that all three conditions of Lemma 4.4 (with S in place of T) are satisfied. Condition 1 is obviously satisfied, since $d_{G^-}(N_0, u)$ does not depend on the positive seed set. As for condition 2, we know that there exists a node w on the unique negative path from $N_0$ to u such that $d_{G^+}(T \cup \{v\}, w) < d_{G^-}(N_0, w)$, and that $d_{G^+}(T, t) \ge d_{G^-}(N_0, t)$ for every node t on the path from $N_0$ to u. Then for node w, $d_{G^+}(T \cup \{v\}, w) < d_{G^-}(N_0, w) \le d_{G^+}(T, w)$, which implies that $d_{G^+}(T \cup \{v\}, w) = d_{G^+}(v, w)$. According to Lemma 4.2, positive influence can reach node w only along the unique positive path from v to w. Since $S \subseteq T$, we also have $d_{G^+}(S, w) \ge d_{G^+}(T, w) > d_{G^+}(v, w)$, and thus $d_{G^+}(S \cup \{v\}, w) = d_{G^+}(v, w) = d_{G^+}(T \cup \{v\}, w) < d_{G^-}(N_0, w)$. Finally, consider condition 3. For any node t on the unique negative path from $N_0$ to u, $d_{G^+}(T, t) \ge d_{G^-}(N_0, t)$. Since $S \subseteq T$, it is easy to verify that $d_{G^+}(S, t) \ge d_{G^+}(T, t)$. Therefore $d_{G^+}(S, t) \ge d_{G^-}(N_0, t)$, and condition 3 also holds. □
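Lemma 4.5 can be sanity-checked exhaustively on small random live-path graphs. The sketch below computes $|IBS(S)|$ through the Lemma 4.3 characterization (a node is blocked if its negative parent is blocked or if positive influence reaches it strictly earlier than negative influence) and then tests monotonicity and diminishing returns for all $S \subseteq T$. The random graph generator, the choice $N_0 = \{0\}$, and all names are illustrative assumptions.

```python
import random
from collections import deque
from itertools import combinations


def bfs(parent, sources):
    """Hop distances from `sources` along live edges (parent[v] = live in-neighbour or None)."""
    children = {}
    for v, u in parent.items():
        if u is not None:
            children.setdefault(u, []).append(v)
    dist, queue = {s: 0 for s in sources}, deque(sources)
    while queue:
        u = queue.popleft()
        for v in children.get(u, []):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def ibs_size(pos_parent, neg_parent, S, N0):
    """|IBS(S)| using the Lemma 4.3 characterization."""
    d_pos, d_neg = bfs(pos_parent, S), bfs(neg_parent, N0)
    blocked = {}
    # Visit nodes by negative distance so each node's negative parent comes first.
    for v in sorted(d_neg, key=d_neg.get):
        inherited = v not in N0 and blocked[neg_parent[v]]
        blocked[v] = inherited or d_pos.get(v, float("inf")) < d_neg[v]
    return sum(blocked.values())


def random_live_graph(n, rng):
    """Random live-path graph on nodes 0..n-1: at most one live in-edge of each sign."""
    def pick(v):
        return rng.choice([None] + [u for u in range(n) if u != v])
    return {v: pick(v) for v in range(n)}, {v: pick(v) for v in range(n)}


def check_lemma_4_5(n=6, trials=30, seed=7):
    """Exhaustively test monotonicity and submodularity of |IBS(S)|."""
    rng = random.Random(seed)
    N0 = {0}
    ground = [v for v in range(n) if v not in N0]
    sets = [frozenset(c) for r in range(len(ground) + 1) for c in combinations(ground, r)]
    for _ in range(trials):
        pos_parent, neg_parent = random_live_graph(n, rng)
        f = {S: ibs_size(pos_parent, neg_parent, S, N0) for S in sets}
        for S in sets:
            for T in sets:
                if not S <= T:
                    continue
                if f[S] > f[T]:
                    return False  # monotonicity violated
                for x in ground:
                    if x not in T and f[S | {x}] - f[S] < f[T | {x}] - f[T]:
                        return False  # submodularity violated
    return True
```

The check only covers small graphs, so it illustrates rather than proves the lemma.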



Algorithm 1 Greedy(k, N_0)
1: initialize S = ∅
2: for i = 1 to k do
3:   select u = arg max_{v ∈ V \ (N_0 ∪ S)} σ_NIR(S ∪ {v})
4:   S = S ∪ {u}
5: end for
6: return S



THEOREM 4.2. For the CLT model, $\sigma_{NIR}(S)$ is monotone and submodular.

Proof. By Lemma 4.1, the CLT model is equivalent to the random live-path graph model. By Lemma 4.5, for each live-path graph, the size of the influence blocking set is monotone and submodular. Since $\sigma_{NIR}(S) = \mathbb{E}_{G_X}[\, |IBS(S)| \,]$ and any convex combination of monotone and submodular functions is still monotone and submodular, $\sigma_{NIR}(S)$ is monotone and submodular. □

We have shown that the objective function of influence blocking maximization under the CLT model is monotone and submodular. Moreover, $\sigma_{NIR}(\emptyset) = 0$. Then, by the well-known result in [20], the greedy algorithm given in Algorithm 1 achieves a $1 - 1/e$ approximation of the optimal solution. The algorithm simply selects seed nodes one by one, each time choosing the node that provides the largest marginal gain in negative influence reduction.
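A compact Python sketch of Algorithm 1 follows, with $\sigma_{NIR}$ estimated by Monte-Carlo simulation of the CLT model. It assumes a hypothetical input format in which `pos_in` and `neg_in` map every node to a dict of its positive and negative in-neighbor weights. As a design choice that is not taken from the paper, it reuses one fixed batch of threshold samples for every evaluation, so that all candidates are compared under common random numbers.

```python
import random


def negative_set(pos_in, neg_in, S, N0, th_pos, th_neg):
    """Deterministic CLT run under fixed thresholds; returns the -active set."""
    plus, minus = set(S), set(N0)
    while True:
        new_plus, new_minus = set(), set()
        for v in pos_in:
            if v in plus or v in minus:
                continue
            if sum(w for u, w in neg_in[v].items() if u in minus) >= th_neg[v]:
                new_minus.add(v)  # negative dominance on ties
            elif sum(w for u, w in pos_in[v].items() if u in plus) >= th_pos[v]:
                new_plus.add(v)
        if not (new_plus or new_minus):
            return minus
        plus |= new_plus
        minus |= new_minus


def greedy_ibm(pos_in, neg_in, N0, k, runs=200, seed=0):
    """Algorithm 1 with sigma_NIR estimated by Monte-Carlo simulation."""
    rng = random.Random(seed)
    samples = [({v: rng.random() for v in pos_in}, {v: rng.random() for v in pos_in})
               for _ in range(runs)]
    baseline = [negative_set(pos_in, neg_in, set(), N0, tp, tn) for tp, tn in samples]

    def sigma_nir(S):
        # Average |IBS(S)|: nodes -active without S but not -active with S.
        return sum(len(base - negative_set(pos_in, neg_in, S, N0, tp, tn))
                   for base, (tp, tn) in zip(baseline, samples)) / runs

    S = set()
    for _ in range(k):
        candidates = [v for v in pos_in if v not in S and v not in N0]
        if not candidates:
            break
        best = max(candidates, key=lambda v: sigma_nir(S | {v}))  # line 3
        S.add(best)  # line 4
    return S
```

Each iteration costs one full batch of simulations per candidate, which is exactly the overhead discussed next.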

However, the greedy algorithm requires evaluating $\sigma_{NIR}(S)$, which cannot be done efficiently. The standard approach of estimating $\sigma_{NIR}(S)$ with Monte-Carlo simulations is slow, especially when we need to simulate the interfering propagation of competing influences. Even with powerful optimization methods such as the lazy forward optimization of [18] or the more advanced approach in [6], the greedy algorithm still takes an unacceptably long time on graphs with more than 10k nodes. We address this efficiency issue in the next section with our new algorithm CLDAG.
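For reference, the lazy forward idea mentioned above can be sketched as follows: because the marginal gains of a submodular function can only shrink as the seed set grows, a stale gain is an upper bound, and a candidate whose gain is fresh and still at the top of the priority queue can be selected without re-evaluating the others. This is a generic sketch for any monotone submodular oracle `f` on frozensets (a hypothetical interface), not the implementation of [18]; candidates are assumed to be mutually comparable, e.g. all integers, so that heap ties are well defined.

```python
import heapq


def lazy_greedy(f, candidates, k):
    """Lazy-forward greedy for a monotone submodular set function f."""
    selected = frozenset()
    base = f(selected)
    # Heap entries: (-marginal_gain, node, number of seeds when the gain was computed).
    heap = [(-(f(frozenset([v])) - base), v, 0) for v in candidates]
    heapq.heapify(heap)
    order = []
    while heap and len(order) < k:
        _, v, computed_at = heapq.heappop(heap)
        if computed_at == len(order):
            # Gain is up to date; by submodularity no stale entry can exceed it.
            order.append(v)
            selected = selected | {v}
            base = f(selected)
        else:
            # Stale upper bound: recompute against the current seed set and reinsert.
            gain = f(selected | {v}) - base
            heapq.heappush(heap, (-gain, v, len(order)))
    return order
```

On a small coverage function, the lazy variant returns the same seeds as plain greedy while skipping some re-evaluations.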

5 CLDAG Algorithm for the IBM Problem 

Motivated by the extremely low efficiency of the greedy algorithm, we tackle this problem with an innovative heuristic approach proposed by Chen et al. This heuristic is characterized (a) by restricting the influence computation for a node v to its local area to reduce computation cost, and (b) by carefully selecting a local graph structure for v that allows efficient and accurate influence computation for v. For the LT model, Chen et al. use a local directed acyclic graph (LDAG) structure [7], because it allows linear-time computation of influence in an LDAG as well as efficient construction of LDAGs using an algorithm similar in style to Dijkstra's shortest path algorithm. We repeat the LDAG construction algorithm of [7] in Algorithm 2



Algorithm 2 Find-LDAG(G, v, θ): compute LDAG for v with threshold θ
1: X = ∅; Y = ∅; ∀u ∈ V, Inf(u, v) = 0; Inf(v, v) = 1
2: while max_{u ∈ V \ X} Inf(u, v) ≥ θ do
3:   x = arg max_{u ∈ V \ X} Inf(u, v)
4:   Y = Y ∪ {(x, u) | u ∈ X}
5:   X = X ∪ {x}
6:   for each node u ∈ N_in(x) do
7:     Inf(u, v) += w_ux · Inf(x, v)
8:   end for
9: end while
10: return D = (X, Y, w) as the LDAG(v, θ)



for completeness. We use $N_{in}(x)$ to denote the set of in-neighbors of node x. The θ in the algorithm is a threshold between 0 and 1 controlling the size of the LDAG: the smaller θ is, the larger the LDAG. The algorithm includes a node x only if its influence on v through the LDAG edges is at least θ. The key update step in line 7 is based on the important linear relationship of activation probabilities in DAG structures shown in [7], repeated below:

$$ap(x) = \sum_{u \in N_{in}(x)} w_{ux} \cdot ap(u), \tag{5.1}$$

where ap(x) is the activation probability of a non-seed node x when the seed set is fixed.
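Algorithm 2 can be transcribed into Python roughly as follows. As assumptions of this sketch, the graph is a dict mapping each node x to a dict of its in-neighbors u and weights $w_{ux}$, and line 4 keeps only pairs (x, u) that are actual edges of G; `find_ldag` is a hypothetical name.

```python
def find_ldag(in_weights, v, theta):
    """Sketch of Algorithm 2 (Find-LDAG) for root node v and threshold theta.

    in_weights: dict mapping each node x to {u: w_ux} over its in-neighbours u
                (hypothetical input format). Returns the node set X and edge list Y.
    """
    influence = {v: 1.0}  # Inf(u, v); absent entries are 0
    X, Y = set(), []
    while True:
        outside = {u: val for u, val in influence.items() if u not in X}
        if not outside:
            break
        x = max(outside, key=outside.get)  # line 3: most influential node outside X
        if outside[x] < theta:  # line 2: stop once the maximum drops below theta
            break
        # Line 4, restricted to real edges x -> u of G (an assumption of this sketch).
        Y.extend((x, u) for u in X if in_weights.get(u, {}).get(x, 0) > 0)
        X.add(x)  # line 5
        for u, w in in_weights.get(x, {}).items():
            # Line 7: linear update from Eq. (5.1), Inf(u, v) += w_ux * Inf(x, v).
            influence[u] = influence.get(u, 0.0) + w * influence[x]
    return X, Y
```

For v with in-edges b -> v (0.5) and c -> v (0.1), and a -> b (0.5), a threshold of 0.2 keeps v, b and a (a's influence on v is 0.25) but drops c; lowering the threshold to 0.05 brings c back in.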

However, in the CLT model, negative and positive influence propagate concurrently in the network and interfere with each other. Thus we need to adjust our LDAG construction and influence computation for the CLT model. First, for each node v, we use Algorithm 2 to construct two LDAGs, $LDAG^+(v)$ and $LDAG^-(v)$, using positive weights and negative weights, respectively. Second, we need to carefully compute the positive activation probability $ap^+(v)$ and the negative activation probability $ap^-(v)$ for any node v under the CLT model, assuming that positive and negative influence propagate through $LDAG^+(v)$ and $LDAG^-(v)$, respectively. This involves a dynamic programming formulation, detailed in the following subsection.

5.1 Influence computation. We propose a dynamic programming method, Inf-CLDAG, to compute the exact activation probability of the central node v in the local structures $LDAG^+(v)$ and $LDAG^-(v)$. Under the CLT model, the two opposite influence diffusions are correlated as they disseminate in the graph, which makes the computation trickier than in the original LT model. In particular, the number of steps taken to activate a node becomes an important factor that must be taken into account when computing the cascade result.

For the following computation, we assume that the positive seed set S and the negative seed set $N_0$ are fixed, and that influence to v diffuses only in $LDAG^+(v)$ and $LDAG^-(v)$. For the IBM problem, we want to compute the negative influence reduction under the positive seed set S. This is essentially a computation of the negative influence coverage, which is given by $\sum_v ap^-(v)$; the reduction is then the difference between this coverage with an empty positive seed set and with S.

Let $P^+(v, t)$ be the probability that the sum of the positive weights of in-edges from positively activated neighbors of node v exceeds its positive threshold exactly at time t, and define $P^-(v, t)$ similarly. Let $ap^+(v, t)$ be the probability that v becomes positively activated exactly at time t, and define $ap^-(v, t)$ similarly. Then $ap^+(v) = \sum_t ap^+(v, t)$ and $ap^-(v) = \sum_t ap^-(v, t)$. We now show how to compute $ap^+(v, t)$ and $ap^-(v, t)$.

By the definition of the CLT model, the following holds for any $v \in V \setminus (S \cup N_0)$ and any $t \ge 1$:

$$P^+(v,t) = \sum_{u \in LDAG^+(v)} w^+_{uv}\, ap^+(u, t-1), \tag{5.2}$$

$$P^-(v,t) = \sum_{u \in LDAG^-(v)} w^-_{uv}\, ap^-(u, t-1), \tag{5.3}$$

$$ap^+(v,t) = P^+(v,t)\Big(1 - \sum_{k=0}^{t} P^-(v,k)\Big), \tag{5.4}$$

$$ap^-(v,t) = P^-(v,t)\Big(1 - \sum_{k=0}^{t-1} P^+(v,k)\Big). \tag{5.5}$$
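As a sanity check on Equations (5.4) and (5.5), consider a hypothetical node v whose only in-neighbors are a positive seed d with $w^+_{dv} = 0.5$ and a negative seed a with $w^-_{av} = 0.4$ (illustrative values, not taken from our experiments). The Python sketch below evaluates the equations at t = 1 and compares them with a Monte-Carlo simulation that draws the two thresholds independently and uniformly at random and lets negative influence prevail when both thresholds are reached in the same step, which is the tie rule encoded by the upper summation limit in Equation (5.4).

```python
import random


def closed_form(w_pos, w_neg):
    """ap+(v, 1) and ap-(v, 1) from Eqs. (5.2)-(5.5) for one hypothetical node v.

    Both in-neighbours of v are seeds, so ap+(d, 0) = ap-(a, 0) = 1 and
    Eqs. (5.2)/(5.3) give P+(v, 1) = w_pos and P-(v, 1) = w_neg, while
    P+(v, 0) = P-(v, 0) = 0 by boundary condition (c).
    """
    p_pos_1, p_neg_1 = w_pos, w_neg
    ap_neg_1 = p_neg_1 * (1.0 - 0.0)              # Eq. (5.5): only P+(v, 0) = 0 enters
    ap_pos_1 = p_pos_1 * (1.0 - (0.0 + p_neg_1))  # Eq. (5.4): P-(v, 0) + P-(v, 1)
    return ap_pos_1, ap_neg_1


def monte_carlo(w_pos, w_neg, runs=200_000, seed=1):
    """Estimate the same probabilities by sampling both thresholds uniformly in [0, 1]."""
    rng = random.Random(seed)
    pos = neg = 0
    for _ in range(runs):
        theta_pos, theta_neg = rng.random(), rng.random()
        if w_neg >= theta_neg:      # negative threshold reached at step 1: negative wins
            neg += 1
        elif w_pos >= theta_pos:    # otherwise the positive threshold decides
            pos += 1
    return pos / runs, neg / runs


print(closed_form(0.5, 0.4))  # exact: ap+(v,1) = 0.5 * (1 - 0.4) = 0.3, ap-(v,1) = 0.4
print(monte_carlo(0.5, 0.4))  # should be close to (0.3, 0.4)
```

The positive activation probability is reduced from 0.5 to 0.3 because v turns negative whenever its negative threshold is reached in the same step.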



Equations (5.2) and (5.3) follow by subtracting the probability that the total weight of the in-edges from activated neighbors of node v exceeds the threshold in some round from 0 to $t-1$ from the corresponding probability for rounds 0 to $t$. Equation (5.4) follows from the fact that if node v becomes positively activated at round t, then the total positive weight must first exceed the positive threshold exactly at round t, while by round t the total negative weight must not have exceeded the negative threshold (otherwise v would be negatively activated, as negative influence prevails when both thresholds are exceeded in the same round). Equation (5.5) is derived similarly, except that the positive threshold must not have been exceeded before round t.

The boundary conditions of the above equations are:
(a) for $v \in S$: $ap^+(v,0) = 1$, $P^+(v,0) = 0$, $P^+(v,t) = ap^+(v,t) = 0$ for all $t \ge 1$, and $P^-(v,t) = ap^-(v,t) = 0$ for all $t \ge 0$;
(b) for $v \in N_0$: $ap^-(v,0) = 1$, $P^-(v,0) = 0$, $P^-(v,t) = ap^-(v,t) = 0$ for all $t \ge 1$, and $P^+(v,t) = ap^+(v,t) = 0$ for all $t \ge 0$;
(c) for $v \notin S \cup N_0$: $P^+(v,0) = ap^+(v,0) = P^-(v,0) = ap^-(v,0) = 0$.

From these equations and boundary conditions, dynamic programming computes the exact activation probability of every node v. However, a naive implementation, which evaluates Equations (5.2)-(5.5) over the whole structure for each of up to $\ell_D$ rounds, takes $O(m_D \ell_D)$ time, where $m_D$ is the size of $LDAG^+(v)$ and $LDAG^-(v)$ and $\ell_D$ is the length of the longest path in $LDAG^+(v)$ and $LDAG^-(v)$. With careful scheduling, as described below, we can reduce the time to $O(m_D)$.

Algorithm 3 gives the pseudocode of Inf-CLDAG, which computes the negative influence $ap^-(v)$ on v from the positive seed set S and the negative seed set $N_0$ through v's local structures $LDAG^+(v)$ and $LDAG^-(v)$. The key feature of the algorithm is an alternating breadth-first search (BFS) traversal of $LDAG^-(v)$ and $LDAG^+(v)$. Starting from the negative seeds, we perform one BFS step in $LDAG^-(v)$ and compute $P^-(x,1)$ and $ap^-(x,1)$ for the traversed nodes x. We then perform one BFS step in $LDAG^+(v)$ from the positive seeds and compute $P^+(x,1)$ and $ap^+(x,1)$ for the traversed nodes. We then return to $LDAG^-(v)$ for one more BFS layer, then to $LDAG^+(v)$ for one more layer, and so on. The order matters: by Equation (5.5), $ap^-(x,t+1)$ needs $P^+(x,k)$ only up to $k = t$, which the previous positive layer has already produced, while by Equation (5.4), $ap^+(x,t+1)$ needs $P^-(x,k)$ up to $k = t+1$, which the negative layer of the same iteration has just produced. With this setup, a single BFS traversal of $LDAG^+(v)$ and $LDAG^-(v)$ suffices to compute all $ap^-(u,t)$ values, which reduces the running time to $O(m_D)$.

Algorithm 3 Inf-CLDAG(v, $LDAG^+(v)$, $LDAG^-(v)$, S, $N_0$)
  $Q^+_0 := S \cap V(LDAG^+(v))$
  $Q^-_0 := N_0 \cap V(LDAG^-(v))$
  initialize $ap^+(u,t)$, $ap^-(u,t)$, $P^+(u,t)$, $P^-(u,t)$ for all u and t to 0 or 1 according to the boundary conditions
  // initialization can be done lazily, when a value is first needed, so it takes no extra time
  set t = 0
  while $Q^+_t \neq \emptyset$ or $Q^-_t \neq \emptyset$ do
    for all node u in $Q^-_t$ do
      for all node x in $LDAG^-(v)$ with $w^-_{ux} \neq 0$ and $x \notin S \cup N_0$ do
        add node x into $Q^-_{t+1}$
        $P^-(x,t+1) = P^-(x,t+1) + w^-_{ux} \cdot ap^-(u,t)$
      end for
    end for
    for all node x in $Q^-_{t+1}$ do
      $ap^-(x,t+1) = P^-(x,t+1)\,(1 - \sum_{k=0}^{t} P^+(x,k))$
    end for
    for all node u in $Q^+_t$ do
      for all node x in $LDAG^+(v)$ with $w^+_{ux} \neq 0$ and $x \notin S \cup N_0$ do
        add node x into $Q^+_{t+1}$
        $P^+(x,t+1) = P^+(x,t+1) + w^+_{ux} \cdot ap^+(u,t)$
      end for
    end for
    for all node x in $Q^+_{t+1}$ do
      $ap^+(x,t+1) = P^+(x,t+1)\,(1 - \sum_{k=0}^{t+1} P^-(x,k))$
    end for
    set t = t + 1
  end while
  $ap^-(v) = \sum_t ap^-(v,t)$
  return $ap^-(v)$

As an example, consider the structures $LDAG^+(v)$ and $LDAG^-(v)$ shown in Figure 1, where d is the only positive seed and a and e are the two negative seeds. During initialization, $ap^+(d,0)$, $ap^-(a,0)$, and $ap^-(e,0)$ are set to 1 and all other values are set to 0. In the first iteration, we start from the negative seeds a and e and perform a one-level BFS traversal in $LDAG^-(v)$, computing $P^-(b,1)$, $P^-(c,1)$, $P^-(v,1)$, $ap^-(b,1)$, $ap^-(c,1)$, and $ap^-(v,1)$. Next, we perform a one-level BFS traversal in $LDAG^+(v)$ starting from the positive seed d and compute $ap^+(f,1)$, $ap^+(c,1)$, and $ap^+(v,1)$, which use the previously computed values $P^-(c,1)$ and $P^-(v,1)$. The second iteration then performs the second-level BFS traversal in $LDAG^-(v)$, which reaches only node v, for which we compute $ap^-(v,2)$. After one more BFS traversal of $LDAG^+(v)$, the traversal has reached all nodes in both LDAGs and the computation finishes.

[Figure 1: two panels, $LDAG^+(v)$ (left) and $LDAG^-(v)$ (right)]
Figure 1: A simple example of the Inf-CLDAG algorithm (red node d is the positive seed and blue nodes a and e are negative seeds).
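The alternating traversal of Algorithm 3 can be sketched in Python as follows. This is a minimal illustrative sketch, assuming each LDAG is passed as a hypothetical adjacency map from a node to its weighted out-edges inside that LDAG (edges point towards v). Rather than re-summing $P^+$ and $P^-$ over all earlier rounds as the pseudocode does, it keeps a running sum per node, which yields the same values. The usage lines use a tiny hypothetical chain, not the exact graph of Figure 1.

```python
from collections import defaultdict


def inf_cldag(v, pos_edges, neg_edges, S, N0):
    """Return ap^-(v) by alternating one BFS layer in LDAG-(v) and one in LDAG+(v).

    pos_edges / neg_edges: hypothetical adjacency maps u -> [(x, w_ux), ...]
    holding the edges of LDAG+(v) and LDAG-(v); S and N0 are disjoint seed sets.
    """
    seeds = set(S) | set(N0)
    cum_pos = defaultdict(float)   # running sum of P+(x, k) over processed rounds
    cum_neg = defaultdict(float)   # running sum of P-(x, k) over processed rounds
    ap_pos_t = {u: 1.0 for u in S}    # layer Q+_t with ap+(u, t); t = 0 holds the seeds
    ap_neg_t = {u: 1.0 for u in N0}   # layer Q-_t with ap-(u, t)
    total = ap_neg_t.get(v, 0.0)      # accumulates sum_t ap-(v, t)
    while ap_pos_t or ap_neg_t:
        # Negative layer: Eq. (5.3), then Eq. (5.5) with P+ summed up to round t.
        p_neg = defaultdict(float)
        for u, a in ap_neg_t.items():
            for x, w in neg_edges.get(u, ()):
                if w != 0 and x not in seeds:
                    p_neg[x] += w * a
        ap_neg_next = {x: p * (1.0 - cum_pos[x]) for x, p in p_neg.items()}
        for x, p in p_neg.items():
            cum_neg[x] += p
        # Positive layer: Eq. (5.2), then Eq. (5.4) with P- summed up to round t + 1.
        p_pos = defaultdict(float)
        for u, a in ap_pos_t.items():
            for x, w in pos_edges.get(u, ()):
                if w != 0 and x not in seeds:
                    p_pos[x] += w * a
        ap_pos_next = {x: p * (1.0 - cum_neg[x]) for x, p in p_pos.items()}
        for x, p in p_pos.items():
            cum_pos[x] += p
        total += ap_neg_next.get(v, 0.0)
        ap_pos_t, ap_neg_t = ap_pos_next, ap_neg_next
    return total


# Hypothetical chain: negative seed a -> b -> v in LDAG-(v), positive seed d -> v in LDAG+(v).
neg_edges = {"a": [("b", 1.0)], "b": [("v", 0.5)]}
pos_edges = {"d": [("v", 0.5)]}
print(inf_cldag("v", pos_edges, neg_edges, S={"d"}, N0={"a"}))   # 0.25
print(inf_cldag("v", pos_edges, neg_edges, S=set(), N0={"a"}))   # 0.5
```

Without the positive seed d, v is negatively activated at step 2 with probability 0.5; with d, v becomes positive at step 1 with probability 0.5, so the negative activation probability drops to $0.5 \times (1 - 0.5) = 0.25$.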

5.2 CLDAG algorithm. Once Algorithm 3 lets us compute the negative influence reduction for any seed set, we can plug it into the greedy algorithm for positive seed selection. We call the resulting algorithm CLDAG; Algorithm 4 gives its complete pseudocode.

The algorithm consists of an initialization part and an iteration part. During initialization (lines 3-5), we construct $LDAG^+(v)$ and $LDAG^-(v)$ for all nodes v. We also maintain an auxiliary set $OutLS^+(v)$, the set of nodes on which v may have positive influence; that is, $u \in OutLS^+(v)$ if and only if $v \in LDAG^+(u)$. Since the positive seed set changes during the algorithm, we write $ap^-(v,S)$ for the negative activation probability of v in its LDAGs under positive seed set S. Then, for each node $u \in LDAG^+(v)$, we compute the incremental influence reduction $ap^-(v,\emptyset) - ap^-(v,\{u\})$ obtained by adding u as a positive seed, and sum these contributions over all v to obtain DecInf(u), the overall incremental influence reduction of node u (lines 7-14).

In the main loop (lines 16-29), we iterate k times to select k seeds. In each iteration, we select as the new seed the node s with the largest DecInf(s). Once s is selected, the DecInf(u) values of other nodes may need to be updated. Since s may positively influence every node in $OutLS^+(s)$, every node $u \in LDAG^+(v)$ with $v \in OutLS^+(s)$ needs to update its DecInf(u). Note that here we take advantage of the local DAG structure, so we do not need to update the incremental influence reduction of every node in the graph. The update is done using Algorithm 3.

Algorithm 4 CLDAG(G, k, $N_0$, θ)
1: S = ∅
2: //Build local structures
3: for all node $v \in V$ do
4:   Build $LDAG^+(v)$, $LDAG^-(v)$, $OutLS^+(v)$ with threshold θ
5: end for
6: //Initialize DecInf
7: set DecInf(v) = 0 for all nodes $v \in V$
8: for all node $v \in V$ with $v \notin N_0$ do
9:   for all node $u \in LDAG^+(v)$ do
10:    before = $ap^-(v, \emptyset)$
11:    after = $ap^-(v, \{u\})$
12:    DecInf(u) += before - after
13:  end for
14: end for
15: //Main loop
16: for i = 1 to k do
17:   $s = \arg\max_{v \in V \setminus (S \cup N_0)} DecInf(v)$
18:   //Update influence reduction
19:   for all node $v \in OutLS^+(s)$ do
20:     for all node $u \in LDAG^+(v)$ do
21:       //Subtract the previous incremental influence reduction
22:       DecInf(u) -= $ap^-(v, S) - ap^-(v, S \cup \{u\})$
23:       //Add the new incremental influence reduction
24:       DecInf(u) += $ap^-(v, S \cup \{s\}) - ap^-(v, S \cup \{s, u\})$
25:     end for
26:   end for
27:   //Add node s as a positive seed
28:   S = S ∪ {s}
29: end for
30: return S
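To make the bookkeeping of Algorithm 4 concrete, here is a minimal Python sketch. It assumes that the node sets of the $LDAG^+(v)$'s are already built and treats neg_prob(v, S) as a black-box oracle for $ap^-(v,S)$ (in CLDAG this role is played by Inf-CLDAG); both names are hypothetical. For brevity it replaces the max-heap with a linear scan and does not cache oracle calls, so it does not attain the complexity bounds analyzed next. The toy oracle in the usage lines is likewise hypothetical.

```python
def cldag_select(k, nodes, N0, ldag_pos_nodes, neg_prob):
    """Greedy selection of k positive seeds following the structure of Algorithm 4.

    ldag_pos_nodes[v]: node set of LDAG+(v) (assumed precomputed, lines 3-5).
    neg_prob(v, S):    ap^-(v, S), e.g. Inf-CLDAG on v's LDAGs (assumed oracle).
    """
    N0 = frozenset(N0)
    S = frozenset()
    out_ls = {u: set() for u in nodes}            # OutLS+(u) = {v : u in LDAG+(v)}
    for v in nodes:
        for u in ldag_pos_nodes[v]:
            out_ls[u].add(v)
    dec_inf = dict.fromkeys(nodes, 0.0)           # lines 7-14
    for v in nodes:
        if v in N0:
            continue
        before = neg_prob(v, S)
        for u in ldag_pos_nodes[v]:
            dec_inf[u] += before - neg_prob(v, S | {u})
    for _ in range(k):                            # lines 16-29
        candidates = [u for u in nodes if u not in S and u not in N0]
        if not candidates:
            break
        s = max(candidates, key=lambda u: dec_inf[u])   # linear scan instead of a max-heap
        for v in out_ls[s]:
            for u in ldag_pos_nodes[v]:
                dec_inf[u] -= neg_prob(v, S) - neg_prob(v, S | {u})            # old gain
                dec_inf[u] += neg_prob(v, S | {s}) - neg_prob(v, S | {s, u})   # new gain
        S = S | {s}
    return S


def toy_neg_prob(v, S):
    """Hypothetical oracle: negative chain a -> b -> v with weights 1, negative seed a."""
    if v == "a":
        return 1.0
    if v == "b":
        return 0.0 if "b" in S else 1.0
    if v == "v":
        return 0.0 if ("b" in S or "v" in S) else 1.0
    return 0.0


ldag = {"a": {"a"}, "b": {"b"}, "c": {"c"}, "v": {"v", "b"}}
print(cldag_select(1, ["a", "b", "c", "v"], {"a"}, ldag, toy_neg_prob))  # frozenset({'b'})
```

Node b is picked first because seeding it saves both b and v, giving it the largest initial DecInf value (2 versus 1 for v).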
Complexity Analysis. Let $n = |V|$, $m^+_\theta = \max_v |LDAG^+(v)|$, $m^-_\theta = \max_v |LDAG^-(v)|$, and $n^+_\theta = \max_v |OutLS^+(v)|$. Let $t^+_\theta$ and $t^-_\theta$ be the times needed to efficiently construct the $LDAG^+(v)$'s and the $LDAG^-(v)$'s, respectively. Note that $m^+_\theta = O(t^+_\theta)$ and $m^-_\theta = O(t^-_\theta)$, and for sparse graphs an efficient implementation of Dijkstra's shortest-path algorithm can bring $t^+_\theta$ and $t^-_\theta$ close to the order of $m^+_\theta$ and $m^-_\theta$. We first analyze the complexity when all LDAG structures are stored.

In the initialization step, we need to compute $LDAG^+(v)$ and $LDAG^-(v)$ for all nodes, which takes $O(n(t^+_\theta + t^-_\theta))$ time. We use a max-heap to store the DecInf(u) values, which takes O(n) time to initialize. Computing DecInf(u) with Algorithm 3 takes $O(n(m^+_\theta + m^-_\theta))$ time. Overall, initialization takes $O(n(t^+_\theta + t^-_\theta))$ time.



For the iteration step, each iteration needs to update DecInf(u) for at most $n^+_\theta m^+_\theta$ nodes, and each update involves an influence computation by Algorithm 3, which takes $O(m^+_\theta + m^-_\theta)$ time, plus an update of DecInf(u) on the max-heap, which takes $O(\log n)$ time. Therefore, the iteration step takes $O(k n^+_\theta m^+_\theta (m^+_\theta + m^-_\theta + \log n))$ time.

Hence, the total time complexity of the algorithm is

$$O\big(n(t^+_\theta + t^-_\theta) + k\, n^+_\theta m^+_\theta (m^+_\theta + m^-_\theta + \log n)\big).$$

For space complexity, we store all LDAGs and $OutLS^+(v)$'s, so the space complexity is $O(n(m^+_\theta + m^-_\theta + n^+_\theta))$. In practice, one may not be able to afford storing all LDAG structures (as was the case in our implementation), so an alternative is to store only the $OutLS^+(v)$'s and recompute LDAGs whenever they are needed. It is easy to see that in this case the time complexity is $O(n(t^+_\theta + t^-_\theta) + k\, n^+_\theta m^+_\theta (t^+_\theta + t^-_\theta + \log n))$, which is not significantly worse than storing the LDAGs, while the space complexity is reduced to $O(n\, n^+_\theta)$.

6 Experiments 

To test the efficiency and effectiveness of CLDAG for the influence blocking maximization problem under the CLT model, we conduct experiments on three real-world datasets as well as on synthetic networks.

6.1 Experiment setting. The three real-world datasets are a mobile network and two collaboration networks. The mobile network is a graph derived from partial call detail record (CDR) data of a Chinese city from China Mobile, the largest mobile communication service provider in China. In the mobile network, every node corresponds to a mobile phone user and the edges correspond to phone calls between users. We use the number of calls between two users as the edge weight and normalize it among all edges incident to a node (the edge thus becomes directed with asymmetric edge weights). NetHEPT and NetPHY are both collaboration networks extracted from the e-print arXiv (http://www.arXiv.org). The former is extracted from the "High Energy Physics - Theory" section (from 1991 to 2003) and the latter from the "Physics" section; both are the same datasets used in [6]. The nodes in both networks are authors, and an edge between two nodes means that the two authors coauthored at least one paper. We use the number of coauthored papers as the edge weight and normalize it among all edges incident to a node. Some basic statistics of these networks are shown in Table 1.

The edge weights described above do not yet differentiate between positive and negative weights. To differentiate them, and to study the effect of different diffusion strengths for positive and negative diffusions, we introduce a positive propagation rate $p^+$ and a negative propagation rate $p^-$, both taking values from 0 to 1. We multiply the weight of each edge by $p^+$ and $p^-$ to obtain its positive and negative edge weights, respectively. As a result, the positive weights of the in-edges of a node sum up to $p^+$, and thus with probability $1 - p^+$ the node will not be positively activated even if all of its in-neighbors are positively activated. The case for $p^-$ is similar.

Table 1: Statistics of the three real-world networks.

Dataset    Nodes    Edges     Average Degree
Mobile     15.5K    37.0K     4.77
NetHEPT    15.2K    58.9K     7.75
NetPHY     37.1K    231.5K    12.48
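The weight preparation described above can be sketched as follows. The input format and function name are hypothetical; the sketch assumes that normalizing "among all edges incident to a node" means dividing each pair's count by the total count on the edges incident to the receiving node, which is consistent with the statement that the positive in-weights of a node sum to $p^+$.

```python
from collections import defaultdict


def directed_weights(counts, p_pos, p_neg):
    """Positive/negative CLT edge weights from symmetric interaction counts.

    counts: {(u, v): c}, one entry per undirected pair (hypothetical input format).
    Each count is normalised by the total count on the edges incident to the
    receiving node, so every node's in-weights sum to 1 before scaling by p+ / p-.
    """
    incident_total = defaultdict(float)
    for (u, v), c in counts.items():
        incident_total[u] += c
        incident_total[v] += c
    w_pos, w_neg = {}, {}
    for (u, v), c in counts.items():
        for src, dst in ((u, v), (v, u)):       # one directed edge per direction
            w = c / incident_total[dst]
            w_pos[(src, dst)] = p_pos * w
            w_neg[(src, dst)] = p_neg * w
    return w_pos, w_neg


w_pos, w_neg = directed_weights({("x", "y"): 3, ("x", "z"): 1}, p_pos=0.5, p_neg=1.0)
print(w_pos[("y", "x")], w_pos[("x", "y")])       # 0.375 0.5  (asymmetric)
print(w_pos[("y", "x")] + w_pos[("z", "x")])      # 0.5 = p+
```

The same undirected count of 3 becomes 3/4 towards x (which has 4 incident calls) but 3/3 towards y, which is where the asymmetry comes from.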

We compare the performance of the following algorithms and heuristics:

• CLDAG: our CLDAG algorithm with θ = 0.01.¹

• Greedy: Algorithm 1 under the CLT model with the lazy-forward optimization of fTSl, and 10000 simulation runs for each influence estimate.

• Degree: a baseline heuristic, simply choosing nodes 
with largest degrees as positive seeds. 

• Random: a baseline heuristic, simply choosing nodes at 
random as positive seeds. 

• Proximity Heuristic: a simple heuristic that chooses direct out-neighbors of negative seeds as positive seeds to block the negative influence. These direct out-neighbors are sorted by the negative weights of their in-edges from the negative seeds, and the top k nodes are selected as positive seeds.

The proximity heuristic introduced above is based on the simple idea of trying to block the influence of negative seeds at their direct neighbors. Note that the proximity heuristic can be viewed as a simplified version of our CLDAG algorithm. Indeed, for each node v, let $LDAG^+(v)$ contain only v itself, and let $LDAG^-(v)$ contain only v itself if v has no in-neighbors in the negative seed set $N_0$, and otherwise contain v together with an in-neighbor of v in $N_0$ that has the largest negative edge weight to v. It is easy to verify that our CLDAG algorithm under these LDAG structures exactly matches the proximity heuristic. Therefore, the proximity heuristic can be treated as an intermediate algorithm between the baseline random algorithm and the full-blown CLDAG algorithm, and it is helpful for understanding the features of CLDAG.
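The proximity heuristic admits a very short implementation. In this sketch the edge-weight format is hypothetical, and scoring a candidate that has several negative-seed in-neighbors by its largest such weight is our assumption, chosen to match the LDAG construction under which CLDAG reduces to this heuristic.

```python
def proximity_heuristic(k, neg_weights, N0):
    """Top-k direct out-neighbours of the negative seeds, ranked by negative in-weight.

    neg_weights: {(u, x): w_minus} (hypothetical input format). A candidate with
    several negative-seed in-neighbours is scored by its largest such weight.
    """
    N0 = set(N0)
    score = {}
    for (u, x), w in neg_weights.items():
        if u in N0 and x not in N0:
            score[x] = max(score.get(x, 0.0), w)
    ranked = sorted(score, key=lambda x: (-score[x], x))   # ties broken by node id
    return ranked[:k]


neg_weights = {("a", "b"): 0.6, ("a", "c"): 0.2, ("e", "c"): 0.5, ("b", "v"): 0.9}
print(proximity_heuristic(2, neg_weights, N0={"a", "e"}))   # ['b', 'c']
```

The edge (b, v) is ignored because b is not a negative seed, which is exactly why the heuristic cannot anticipate negative diffusion along longer paths.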

Since the CLT model is probabilistic, when we evaluate the blocking effect of any given positive and negative seed sets, we run the simulation 1000 times and take the average as the result. The negative seeds in $N_0$ are chosen either randomly or from the nodes with the largest degrees. The scalability test is run on a server with two 2 GHz Intel Xeon E5504 CPUs (4 cores each) and 36 GB of memory, while all other tests are run on a Dell D630 laptop with 2 GB of memory. All experiment code is written in C++.

¹ We found that θ < 0.01 does not significantly improve the blocking effect on any of the networks tested.

Figure 2: Experiment result of the comparison with the greedy algorithm.

6.2 Results with the greedy algorithm. We first run tests that include the greedy algorithm. Since the greedy algorithm runs very slowly on large graphs, we extract two subgraphs from the datasets for comparison: a 1000-node graph extracted from the mobile network and a 5000-node graph extracted from the NetHEPT network. The extraction is done by randomly selecting a node in the graph and performing BFS from it until we obtain the desired number of nodes; we then include all edges among these nodes in the subgraph. We choose the 50 nodes with the highest degrees as negative seeds and select 200 positive seeds to block their influence. Both $p^+$ and $p^-$ are set to 1. The experiment results are shown in Figure 2.
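The BFS-based subgraph extraction can be sketched as below. The adjacency format and function name are hypothetical, and the sketch assumes that the kept edges are those whose two endpoints are both sampled (the induced subgraph); if the start node's component is smaller than the requested size, fewer nodes are returned.

```python
import random
from collections import deque


def bfs_subgraph(adj, size, seed=None):
    """Sample `size` nodes by BFS from a random start node; keep the induced edges.

    adj: {node: list of neighbours} (hypothetical input format; node ids sortable).
    Returns the sampled node set and every edge of `adj` between two sampled nodes.
    """
    rng = random.Random(seed)
    start = rng.choice(sorted(adj))
    visited = {start}
    queue = deque([start])
    while queue and len(visited) < size:
        u = queue.popleft()
        for x in adj[u]:
            if len(visited) >= size:
                break
            if x not in visited:
                visited.add(x)
                queue.append(x)
    edges = {(u, x) for u in visited for x in adj[u] if x in visited}
    return visited, edges


# Path graph 0 - 1 - ... - 9 with both edge directions listed.
adj = {i: [j for j in (i - 1, i + 1) if 0 <= j < 10] for i in range(10)}
nodes, edges = bfs_subgraph(adj, 4, seed=0)
print(sorted(nodes), len(edges))   # four consecutive nodes and 6 directed edges
```

Because BFS grows the sample outward from one node, the extracted subgraph is always connected, which keeps the local diffusion structure around the start node intact.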

From Figure 2 (a) and (b), we can see that the CLDAG algorithm consistently matches the performance of the greedy algorithm on both datasets. In the 1000-node mobile network test, CLDAG significantly outperforms the proximity heuristic: e.g., when CLDAG completely blocks all negative influence with 130 seeds, the proximity heuristic still allows negative influence to reach about 30 more nodes. In terms of negative influence reduction, CLDAG's reduction is (120 - 50)/(120 - 80) = 175% of that of the proximity heuristic, i.e., 75% higher. On the 5000-node NetHEPT dataset, the proximity heuristic performs as well as CLDAG and the greedy algorithm. In both cases, the random and degree heuristics perform badly, showing essentially no blocking effect at all. This contrasts with the degree heuristic results for influence maximization reported in previous papers [6, 5, 7], where the degree heuristic still has moderate gains when selecting more seeds. Our interpretation is that for influence blocking maximization, knowing where the negative seeds are becomes very important, so the proximity heuristic can behave reasonably well, while the degree heuristic, being oblivious to the location of the negative seeds, becomes useless.

Figure 3: Experiment result on algorithm scalability.

From Figure 2 (c), we see that CLDAG is much faster than the greedy algorithm, with a speedup of more than two orders of magnitude. With 5000 nodes, the greedy algorithm already takes more than five hours, while CLDAG takes only one minute to select 200 seeds.

We further compare the scalability of CLDAG with that of the greedy algorithm. For this test, we use a family of synthetic power-law graphs generated by the DIGG package [9]. We generate graphs whose number of nodes doubles from 0.2K, 0.4K, up to 6.4K, using a power-law exponent of 2.16. For each size, we generate 10 different random graphs and report the running time averaged over these 10 graphs. We randomly choose 50 nodes as negative seeds and find 50 positive seeds to block the negative influence. We set both $p^+$ and $p^-$ to 1. The scalability results are shown in Figure 3.

The results clearly show that CLDAG is two orders of magnitude faster than the greedy algorithm and that its running time grows linearly with the size of the graph, which indicates good scalability. Therefore, compared with the greedy algorithm, CLDAG matches its blocking effect while achieving a speedup of at least two orders of magnitude in running time.

6.3 Results on larger datasets without the greedy algorithm. We conduct experiments on the full graphs of the three datasets, but we do not include the greedy algorithm since its running time becomes prohibitive. The initial negative seeds are chosen either randomly or as the nodes with the highest degrees. We first set $p^+$ and $p^-$ to 1.

As shown in Figure 4 (a) to (f), CLDAG strictly dominates the proximity heuristic in all cases. For random negative seed selection, the negative influence reduction of CLDAG is on average 78.24% higher than that of the proximity heuristic (the percentage is averaged over the results from 1 seed to 200 seeds). For max-degree negative seed selection, CLDAG improves on the proximity heuristic even more, by 80.75% on average. The degree and random heuristics still show no blocking effect in any test case. The running time of CLDAG is consistently low, as shown in Figure 4 (g). The results demonstrate that across all networks and all negative seed selection methods, CLDAG consistently outperforms the other heuristics in negative influence reduction, and it achieves this performance efficiently.

Figure 4: Experiment result of the CLT model on three real datasets. We choose 200 negative seeds with max degree in experiments (a), (c), (e) and 400 random negative seeds in experiments (b), (d), (f). Panel (g): running time of the CLDAG algorithm on real networks.

Next, we vary the propagation rates $p^+$ and $p^-$ to check their effect on influence dissemination and on the performance of our algorithm. For simplicity, we only present experiment results on the NetHEPT network. We choose the 200 nodes with max degree as negative seeds and select 200 positive nodes to block their influence. In one test we set $p^+ = 0.5$ and $p^- = 1$, so negative influence diffusion is stronger, while in the second test we use $p^+ = 1$ and $p^- = 0.5$, making positive influence diffusion stronger.

Figure 5: Experiment result of the CLT model for propagation rates $p^+$ and $p^-$.

Figure 5 reports our simulation results. First, as expected, when negative influence is stronger, more nodes become negative in the absence of positive influence (1350 nodes vs. 560 nodes in our two test cases). More importantly, our CLDAG algorithm performs much better than the proximity heuristic when negative influence is stronger (Figure 5 (a)). This is because in this case negative diffusion can traverse long paths, so simply placing positive seeds next to the negative seeds may not block the negative diffusion well. On the other hand, when negative influence is weak (Figure 5 (b)), it can be effectively blocked by placing positive seeds next to the negative seeds, and thus the proximity heuristic performs close to CLDAG.

To summarize, our results show that CLDAG has the best performance among the tested heuristics across all graphs, especially when negative influence diffusion is strong. The proximity heuristic, as a simplified version of CLDAG, performs reasonably in a few cases, especially when negative influence diffusion is weak, and can be used as a fast alternative to CLDAG in such cases. However, there are situations in which the proximity heuristic is significantly worse than CLDAG. According to our test results, the traditional degree heuristic cannot be used for influence blocking maximization at all.

6.4 Effectiveness of influence blocking at different negative seed sizes. Finally, we test the effectiveness of influence blocking with CLDAG as the number of negative seeds increases. We vary the negative seed set size from 1 to 1000 and measure how many positive seeds CLDAG requires to reduce negative influence to 10%. We cap the number of positive seeds at 1000. For this test, we use the NetHEPT network, select the negative seeds with the largest degrees, and set $p^+ = p^- = 1$. The results are shown in Table 2, where $\sigma_n(S, N_0)$ denotes the expected number of negative activations with positive seeds S and negative seeds $N_0$.

Table 2: Result on the effectiveness of influence blocking.

The results show that reducing negative influence to about the 10% level requires about 20 to 30 times as many positive seeds as negative seeds, and that blocking negative influence becomes increasingly hard as the negative seed set grows. For example, with 1000 negative seeds, spending an equal number of 1000 positive seeds reduces negative influence by only 17%. Therefore, the first mover has a clear advantage, and the best time to block negative influence is before it becomes pervasive.

7 Conclusion and Discussions 

In this work, we study the influence blocking maximization problem under the competitive linear threshold model. We show that the objective function of the IBM problem is submodular under the CLT model, so the greedy approximation algorithm applies. We then design an efficient algorithm, CLDAG, to overcome the slowness of the greedy algorithm. Our simulation results demonstrate that CLDAG matches the greedy algorithm in blocking effect while significantly improving running time. CLDAG also outperforms other heuristics, such as the proximity heuristic that selects direct neighbors of negative seeds, showing that CLDAG is a stable and robust algorithm for the IBM problem.

Finally, we compare our work with two closely related results in the literature, which reveal some interesting subtleties in competitive influence diffusion. First, in [3], Budak et al. study the IBM problem under an extended IC model. They show, however, that when the IC model is extended to allow positive and negative diffusions with two different sets of parameters, the IBM objective is not submodular. This indicates a subtle difference between diffusion models. In this sense, the CLT model is more expressive, since it is easier to model different diffusion strengths in the CLT model and observe their effect, as we did in our evaluation (Figure 5). They also show that when the positive weights are restricted to be 1, or to be the same as the negative weights, the problem becomes submodular. For these cases, we are able to design efficient algorithms similar to MIA and MIA-N of OH), and our simulation results are similar when compared with the greedy algorithm and other heuristics, but we do not report them here.

Second, in [2], Borodin et al. propose several competitive diffusion models extended from the LT model. In particular, their separate threshold model is essentially the CLT model of this paper (with a slightly different tie-breaking rule). Interestingly, they show that the problem of maximizing positive influence given a fixed negative seed set is not submodular (which also applies to our CLT model), while we show here that influence blocking maximization is submodular. Intuitively, this is because even though a positive seed x blocks negative influence, maximizing positive influence may also require other positive seeds to activate the nodes that x shields from negative influence. Therefore, the marginal gain of x for the positive influence maximization objective is larger when other positive seeds cooperate with x, which breaks submodularity.
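The intuition above can be made concrete with a brute-force submodularity check on a tiny ground set. The two objectives below are hypothetical toy functions that only mimic the described effect (x shields a node y, and z is needed to turn y positive); they are not derived from the CLT model or from [2].

```python
from itertools import combinations


def is_submodular(f, ground, tol=1e-12):
    """Brute-force test of f(A + x) - f(A) >= f(B + x) - f(B) for all A ⊆ B, x ∉ B."""
    ground = list(ground)
    subsets = [frozenset(c) for r in range(len(ground) + 1)
               for c in combinations(ground, r)]
    for B in subsets:
        for A in subsets:
            if not A <= B:
                continue
            for x in ground:
                if x not in B and f(A | {x}) - f(A) < f(B | {x}) - f(B) - tol:
                    return False
    return True


# Hypothetical toy objectives over candidate positive seeds {"x", "z"}; they only
# mimic the intuition above and are not computed from a CLT instance.
def blocking(S):
    # each seed saves itself; x additionally shields a node y from negative influence
    return len(S) + (1 if "x" in S else 0)


def positive(S):
    # y turns positive only if x shields it AND z is there to activate it
    return len(S) + (1 if {"x", "z"} <= S else 0)


print(is_submodular(blocking, "xz"))   # True
print(is_submodular(positive, "xz"))   # False
```

For the positive objective, adding x to the empty set gains 1, but adding x to {z} gains 2, which is exactly the increasing marginal gain described above.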

Several improvements and future directions are possible. One direction is to look for even faster and more space-efficient algorithms for influence blocking maximization. Another direction is to tackle the IBM problem in other competitive diffusion models, especially models without the submodularity property.
Appendix

A Proof of TheoremBU 

Proof. Consider an instance of the NP-complete vertex cover problem defined by an undirected |V|-node graph G = (V, E) and an integer k. The vertex cover problem asks whether there exists a set S of k nodes in G such that every edge has at least one endpoint in S. We show that this problem can be reduced to the IBM problem under the CLT model.

Given an instance of the vertex cover problem involving a graph G, let $d_G(v)$ denote the degree of node v in G and define $d_m = \max_{v \in V} d_G(v)$. We construct a corresponding instance of the IBM problem as follows. We first construct a directed graph G' from G. Besides the original graph structure, for each vertex $v_i$ in G, we build a spindle structure $S_i$ with |V| + 2 nodes and a chain $C_i$ with ℓ + 1 nodes, where $S_i$ and $C_i$ share the node $v_i$. We use the toy example in Figure 6 to describe our construction.



Figure 6: Graph construction for the NP-hardness proof (positive edges are red, and negative seeds and negative edges are blue).

As shown in Figure 6, each spindle structure consists of a top node, |V| intermediate nodes, and a bottom node. The top node is chosen as a negative seed and has |V| negative edges (meaning that their positive weight is 0), each with negative weight 1, to the intermediate nodes. Each intermediate node has a negative edge with negative weight 1/|V| to the bottom node. We then use the bottom nodes of all spindle structures to form a graph similar to G, except that every edge of the original G is directed in both directions to build positive edges (meaning that their negative weights are 0). The weights of the positive edges are set according to the degrees of the corresponding nodes in G: for the bottom node $v_i$ of spindle structure $S_i$, we set the weights of all positive in-edges of $v_i$ equally to $1/d_G(v_i)$. Next, starting from $v_i$, we add a chain with ℓ + 1 nodes (including $v_i$) and ℓ directed negative edges of weight 1. We set ℓ = [ jypj — 11. Thus the total size of the constructed graph G' is $O(|V|^2)$.

We first show Lemma A.1 for our NP-hardness proof.



Lemma A.1. In the constructed graph G', given a positive seed set S, if there exists a bottom node v in a spindle structure whose positive activation probability at step 1 is strictly less than 1, then choosing v as one more positive seed achieves a higher negative influence reduction than choosing any intermediate node of a spindle structure or any node on a chain.

Proof. Assume that the positive activation probability of node v at step 1 is $p^+$. First, it is clear that choosing the bottom node is a better strategy than choosing a node on any chain. By adding node v to the positive seed set, we obtain a negative influence reduction $\Delta_v \ge (1 - p^+)(\ell + 1)$. By adding any intermediate node to the positive seed set, we obtain $\Delta_{inter} \le 1 + \frac{1}{|V|}(1 - p^+)(\ell + 1)$. With the above choice of ℓ, we can easily get $\Delta_v \ge (1 - p^+)(\ell + 1) > 1 + \frac{1}{|V|}(1 - p^+)(\ell + 1) \ge \Delta_{inter}$. Therefore, choosing the bottom node v always leads to a greater gain in negative influence reduction than choosing any intermediate or chain node. □

If there is a vertex cover S of size k in G, then one can deterministically achieve $\sigma_{nr}(S) = |V|(\ell + 1)$ by choosing the vertex cover as the positive seed set: without positive seeds, all nodes in G' would be negatively activated, while with positive seed set S we save the bottom nodes as well as the nodes on the chains. Conversely, this is the only way to obtain a set with $\sigma_{nr}(S) \ge |V|(\ell + 1)$. Otherwise, if the positive seeds among the bottom nodes do not form a vertex cover of the original graph G, the probability that all bottom nodes are positively activated in step 1 is strictly less than 1, and the gap is at least $1/d_m$. According to Lemma A.1, all k positive seeds must be chosen among the bottom nodes. Thus, in step 2 any node that was not positive in step 1 must become negative, due to negative influence dominance. Hence, $\sigma_{nr}(S) \le (|V| - 1)(\ell + 1) + (1 - 1/d_m)(\ell + 1) < |V|(\ell + 1)$. Therefore, by checking whether G' has a positive seed set of size k that achieves a negative influence reduction of at least $|V|(\ell + 1)$, we can decide whether the original graph G has a vertex cover of size k. □



